STA4173 Lecture 4, Summer 2023
We have now learned one- and two-sample t-tests.
Recall, when we have two samples, they can be independent samples or dependent samples.
Independent samples: two-sample t-test
Dependent samples: paired t-test (one-sample t-test on difference)
Today we will discuss how to assess the assumptions on t-tests.
All t-tests assume approximate normality of the data.
In the case of one-sample t-tests, the measure of interest must somewhat follow a normal distribution.
In the case of two-sample t-tests, the measure of interest in each group must somewhat follow a normal distribution.
Note that a paired t-test is technically a one-sample t-test, so we will examine normality of the difference.
There are formal tests for normality (see article here); however, we will not use them.
Instead, we will assess normality using a quantile-quantile (q-q) plot.
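This is a scatterplot of the sample quantiles against the theoretical quantiles; the points will fall approximately along a straight line (the 45° line) if the assumed distribution is correct.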
We will create q-q plots for:
The measurements in the case of the one-sample t-test.
The measurements from each group in the case of the two-sample t-test.
The difference between the groups in the case of the paired t-test.
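To see what a q-q plot actually draws, here is a minimal base R sketch (not the ggplot2 code used in this lecture). It pairs the sorted sample with normal quantiles evaluated at ppoints(n), which is what base R's qqnorm() uses. The seed and the object names are assumptions chosen only for illustration.

```r
set.seed(4173)                      # arbitrary seed (assumption), for reproducibility
x <- rnorm(10)                      # simulate from N(0, 1)
n <- length(x)

theoretical <- qnorm(ppoints(n))    # normal quantiles at n plotting positions
observed <- sort(x)                 # ordered sample values

# each row is one point on the q-q plot
qq_by_hand <- data.frame(theoretical, observed)
print(qq_by_hand)

# qqnorm() returns the same coordinates, in the original data order
qq_base <- qqnorm(x, plot.it = FALSE)
same_x <- isTRUE(all.equal(sort(qq_base$x), theoretical))
same_y <- isTRUE(all.equal(sort(qq_base$y), observed))
```

If the data are roughly normal, the observed column grows roughly linearly with the theoretical column, which is the straight-line pattern we look for in the plot.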
We will use ggplot2 in tidyverse.
dataset %>% # pipe in data
ggplot(aes(sample = [variable])) + # call ggplot(), specify the variable to be examined
stat_qq(size=3) + # request q-q scatterplot
stat_qq_line() + # request q-q 45° line
theme_bw() + # change background :)
labs(x = "Theoretical",
y = "Sample") + # change axis labels
theme(text = element_text(size=14)) # change font size
outcome <- rnorm(10) # simulate from N(0, 1)
sim_normal <- tibble(outcome)
sim_normal %>% # pipe in data
ggplot(aes(sample = outcome)) + # call ggplot(), specify the variable to be examined
stat_qq(size=1.5) + # request q-q scatterplot
stat_qq_line() + # request q-q 45° line
theme_bw() + # change background :)
labs(x = "Theoretical",
y = "Sample") + # change axis labels
theme(text = element_text(size=14)) # change font size
Recall the space/earth rat example for the two-sample t-test.
space_rbc <- c(8.59, 6.87, 7.00, 8.64, 7.89, 8.80, 7.43, 9.79, 9.30, 7.21, 6.85, 8.03, 6.39, 7.54)
space <- tibble(space_rbc)
space_qq <- space %>%
ggplot(aes(sample = space_rbc)) +
stat_qq(size=1.5) +
stat_qq_line() +
labs(x = "Theoretical",
y = "Sample",
title = "Space Rats") +
ylim(5, 11) +
theme_bw() +
theme(text = element_text(size=14))
earth_rbc <- c(8.65, 7.62, 7.33, 6.99, 7.44, 8.58, 8.40, 8.55, 9.88, 9.66, 8.70, 9.94, 7.14, 9.14)
earth <- tibble(earth_rbc)
earth_qq <- earth %>%
ggplot(aes(sample = earth_rbc)) +
stat_qq(size=1.5) +
stat_qq_line() +
labs(x = "Theoretical",
y = "Sample",
title = "Earth Rats") +
ylim(5, 11) +
theme_bw() +
theme(text = element_text(size=14))
g1 <- c(17.6, 20.2, 19.5, 11.3, 13.0, 16.3, 15.3, 16.2, 12.2, 14.8, 21.3, 22.1, 16.9, 17.6, 18.4)
g2 <- c(17.3, 19.1, 18.4, 11.5, 12.7, 15.8, 14.9, 15.3, 12.0, 14.2, 21.0, 21.0, 16.1, 16.7, 17.5)
garage <- tibble(g1, g2) %>% # create dataset with both garages
mutate(d = g1-g2) # create variable of differences
garage %>%
ggplot(aes(sample = d)) +
stat_qq(size=1.5) +
stat_qq_line() +
labs(x = "Theoretical",
y = "Sample") +
#ylim(5, 11) +
theme_bw() +
theme(text = element_text(size=14))
When doing the two-sample t-test, there is an assumption of equal variance.
We will formally test this assumption using the folded F test.
We will use the var.test() function.
Its formula syntax mirrors that of the t.test() function.
rbc <- c(8.59, 8.64, 7.43, 7.21, 6.39, 6.87, 7.89,
9.79, 6.85, 7.54, 7.00, 8.80, 9.30, 8.03,
8.65, 6.99, 8.40, 9.66, 7.14, 7.62, 7.44,
8.55, 8.70, 9.14, 7.33, 8.58, 9.88, 9.94) # enter blood cell mass
rat <- c(rep("Space",14), rep("Earth",14)) # enter identifier
data <- tibble(rat, rbc) # create dataset
var.test(rbc ~ rat, data = data)
F test to compare two variances
data: rbc by rat
F = 0.97659, num df = 13, denom df = 13, p-value = 0.9666
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.3135073 3.0421015
sample estimates:
ratio of variances
0.9765864
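Hypotheses
H_0: \sigma^2_{\text{Earth}} = \sigma^2_{\text{Space}} versus H_1: \sigma^2_{\text{Earth}} \ne \sigma^2_{\text{Space}}
Test Statistic and p-Value
F = 0.977, p = 0.967
Conclusion/Interpretation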
Fail to reject H_0.
There is not sufficient evidence to suggest that the variances are different between earth and space rats.
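As a check on the output above, the folded F statistic can be reproduced by hand. This is a minimal sketch that assumes the statistic is the ratio of the two sample variances, with the group that comes first alphabetically (Earth) in the numerator, and that the two-sided p-value doubles the smaller tail area. The object names are illustrative.

```r
space_rbc <- c(8.59, 6.87, 7.00, 8.64, 7.89, 8.80, 7.43,
               9.79, 9.30, 7.21, 6.85, 8.03, 6.39, 7.54)
earth_rbc <- c(8.65, 7.62, 7.33, 6.99, 7.44, 8.58, 8.40,
               8.55, 9.88, 9.66, 8.70, 9.94, 7.14, 9.14)

# ratio of sample variances: Earth (first alphabetically) over Space
F_stat <- var(earth_rbc) / var(space_rbc)

# numerator and denominator degrees of freedom
df1 <- length(earth_rbc) - 1
df2 <- length(space_rbc) - 1

# two-sided p-value: double the smaller tail area of the F distribution
lower_tail <- pf(F_stat, df1, df2)
p_value <- 2 * min(lower_tail, 1 - lower_tail)

print(round(c(F = F_stat, p = p_value), 4))
```

The values should agree with the F and p-value reported by var.test() above.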
What happens if we break an assumption?
If we break the normality assumption, use a nonparametric method.
If we break the variance assumption of the two-sample t-test, use Satterthwaite’s approximation.
This estimates the df of the t distribution:
\text{df}=\frac{ \left( \frac{s^2_1}{n_1} + \frac{s_2^2}{n_2} \right)^2 }{ \frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}
The good news: t.test() in R assumes unequal variances by default (var.equal = FALSE) :)
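The df formula can be checked against what t.test() reports. The sketch below reuses the space/earth rat data from earlier in the lecture; the object names are illustrative, and the comparison with var.equal = TRUE is included only to show the pooled df for contrast.

```r
space_rbc <- c(8.59, 6.87, 7.00, 8.64, 7.89, 8.80, 7.43,
               9.79, 9.30, 7.21, 6.85, 8.03, 6.39, 7.54)
earth_rbc <- c(8.65, 7.62, 7.33, 6.99, 7.44, 8.58, 8.40,
               8.55, 9.88, 9.66, 8.70, 9.94, 7.14, 9.14)

# sample variances and sample sizes
v1 <- var(space_rbc); n1 <- length(space_rbc)
v2 <- var(earth_rbc); n2 <- length(earth_rbc)

# Satterthwaite's approximation to the degrees of freedom
df_satt <- (v1 / n1 + v2 / n2)^2 /
  ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))

# default t.test(): unequal variances, df estimated as above
welch <- t.test(space_rbc, earth_rbc)

# pooled version for comparison: df = n1 + n2 - 2
pooled <- t.test(space_rbc, earth_rbc, var.equal = TRUE)

print(c(by_hand = df_satt,
        t_test = unname(welch$parameter),
        pooled = unname(pooled$parameter)))
```

Because the two sample variances here are nearly equal, the approximate df is very close to the pooled df.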
Important note!!
We just discussed assumptions on t-tests
Dependent / paired t-test: normality
Independent two-sample t-test: normality and variance
If we break the variance assumption with the two-sample t-test, there is an alternative version of the t-test.
If we break the normality assumption, we must look to nonparametric methods.
The t-tests we have already learned are considered parametric methods.
Nonparametric methods do not have distributional assumptions.
Why don’t we always use nonparametric methods?
They are often less efficient: a larger sample size is required to achieve the same power (that is, the same probability of a Type II error).
They discard useful information :(
In the nonparametric tests we will be learning, the data will be ranked.
Let us first consider a simple example,
x: 1, 7, 10, 2, 6, 8
Our first step is to reorder the data:
x: 1, 2, 6, 7, 8, 10
Then, we replace with the ranks:
R: 1, 2, 3, 4, 5, 6
What if the data values are not all unique?
For example,
x: 9, 8, 8, 0, 3, 4, 4, 8
Let’s reorder:
x: 0, 3, 4, 4, 8, 8, 8, 9
Rank ignoring ties:
R: 1, 2, 3, 4, 5, 6, 7, 8
Now, the final rank, where tied values share the average of the ranks they occupy:
R: 1, 2, 3.5, 3.5, 6, 6, 6, 8
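The ranking steps above can be reproduced with the base R rank() function, whose default handling of ties is to average the tied ranks. This is a small sketch using the second example; the object names are illustrative.

```r
x <- c(9, 8, 8, 0, 3, 4, 4, 8)

x_sorted <- sort(x)                                    # reorder the data

# rank ignoring ties: ties are broken by position
rank_no_ties <- rank(x_sorted, ties.method = "first")

# final rank: tied values share the average of their ranks (the default)
rank_final <- rank(x_sorted)

print(rbind(x_sorted, rank_no_ties, rank_final))

# rank() also works on the unsorted data, returning ranks in the original order
rank_original_order <- rank(x)
```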
Wilcoxon Rank Sum / Mann-Whitney U
We will use the wilcox.test() function to perform the test.
Like before, R will use the group that is “first” in the grouping variable (alphabetically, for a character variable).
When exposed to an infection, a person typically develops antibodies. The extent to which the antibodies respond can be measured by looking at a person’s titer, which is a measure of the number of antibodies present. The higher the titer is, the more antibodies that are present.
The following data represent the titers of 11 ill people and 11 healthy people exposed to the tularemia virus in Vermont.
Is the level of titer in the ill group greater than the level of titer in the healthy group? Use the \alpha = 0.1 level of significance.
Is this independent or dependent data?
From the problem statement: The following data represent the titers of 11 ill people and 11 healthy people exposed to the tularemia virus in Vermont.
The ill and healthy groups are made up of different people, so these are independent samples.
outcome <- c(640, 160, 1280, 320, 80, 640, 640, 160, 1280, 640, 160,
10, 320, 160, 160, 320, 320, 10, 320, 320, 80, 640)
group <- c(rep("ill",11), rep("healthy",11))
data <- tibble(outcome, group)
wilcox.test(outcome ~ group,
data = data,
exact = FALSE,
alternative="less")
Wilcoxon rank sum test with continuity correction
data: outcome by group
W = 35, p-value = 0.04657
alternative hypothesis: true location shift is less than 0
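Hypotheses
H_0: M_{\text{ill}} \le M_{\text{healthy}} versus H_1: M_{\text{ill}} > M_{\text{healthy}}
Test Statistic and p-Value
W = 35, p = 0.047
Rejection Region
Reject H_0 if p < \alpha; \alpha = 0.1.
Conclusion/Interpretation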
Reject H_0.
There is sufficient evidence to suggest that the level of titer in the ill group is greater than the level of titer in the healthy group.
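To connect the reported W to the ranks, here is a minimal sketch that computes it by hand. It assumes R's convention that W is the sum of the ranks in the first group (healthy, which comes first alphabetically) minus n(n+1)/2 for that group. The object names are illustrative.

```r
ill <- c(640, 160, 1280, 320, 80, 640, 640, 160, 1280, 640, 160)
healthy <- c(10, 320, 160, 160, 320, 320, 10, 320, 320, 80, 640)

# rank all 22 observations together; ties get average ranks
all_ranks <- rank(c(healthy, ill))

# sum of the ranks belonging to the healthy group (the first 11 values)
n_healthy <- length(healthy)
rank_sum_healthy <- sum(all_ranks[seq_len(n_healthy)])

# R's W: rank sum of the first group minus its smallest possible value
W_by_hand <- rank_sum_healthy - n_healthy * (n_healthy + 1) / 2

# same test without the formula interface: is healthy shifted below ill?
res <- wilcox.test(healthy, ill, alternative = "less", exact = FALSE)

print(c(rank_sum = rank_sum_healthy,
        W_by_hand = W_by_hand,
        W_R = unname(res$statistic),
        p = res$p.value))
```

A small W means the healthy titers occupy mostly the low ranks, which is the direction the alternative hypothesis describes.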
Recall the tapeworm in sheep example from the two-sample t-test lecture.
A random sample of 24 worm-infested lambs of approximately the same age and health was randomly divided into two groups.
Twelve of the lambs were injected with medication and the remaining 12 were left untreated.
Hypotheses
Test Statistic and p-Value
Rejection Region
Conclusion/Interpretation
Fail to reject H_0.
There is not sufficient evidence to suggest that untreated lambs have a mean tapeworm count that is more than five units greater than the mean count for the treated lambs.
Today we have discussed that we turn to nonparametric tests when we do not meet distributional assumptions for t-tests.
If we do not meet the normality assumption for the paired t-test, we need a nonparametric alternative.
Now we will learn the Wilcoxon signed rank test, the nonparametric alternative to the dependent t-test.
Like in the dependent t-test, we will analyze the difference between two values.
Like in the Wilcoxon rank sum, we will be analyzing ranks.
Before ranking, we will find the difference between the paired observations and eliminate any 0 differences.
Note 1: eliminating 0 differences is the big difference from the other tests!
Note 2: because we are eliminating 0 differences, our sample size will update to the number of pairs with a non-0 difference.
When ranking, the differences are ranked based on the absolute value of the difference.
We also keep the sign of the difference.
| X | Y | D | \|D\| | Rank |
|---|---|---|---|---|
| 5 | 8 | -3 | 3 | -1.5 |
| 8 | 5 | 3 | 3 | +1.5 |
| 4 | 4 | 0 | 0 | (dropped) |
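The signed ranks in the table can be reproduced in a few lines of R. This is a sketch of the ranking steps only, not a full test; the names T_plus and T_minus for the sums of positive and negative ranks are assumptions for illustration.

```r
x <- c(5, 8, 4)
y <- c(8, 5, 4)

d <- x - y                          # paired differences: -3, 3, 0
d_nonzero <- d[d != 0]              # eliminate 0 differences
n_used <- length(d_nonzero)         # updated sample size

# rank the absolute differences, then reattach the signs
signed_rank <- sign(d_nonzero) * rank(abs(d_nonzero))

T_plus <- sum(signed_rank[signed_rank > 0])        # sum of positive ranks
T_minus <- sum(abs(signed_rank[signed_rank < 0]))  # sum of negative ranks

print(signed_rank)
```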
Wilcoxon Signed Rank
We will use the wilcox.test() function to perform the test.
One important variable to consider in trading stock is the daily volume. Volume is measured in number of shares traded in the stock. Stocks with lower volume tend to have more variability in the stock price.
A stock analyst believes the median number of shares traded in Walgreens Boots Alliance (WBA) stock is greater than that in McDonald’s (MCD).
Because national news can play a role in volume of stock traded, the analyst records the volume (in millions of shares) for each of the two stocks on the same day for 14 randomly selected trading days. Test the analyst’s belief at the \alpha=0.05 level of significance.
The median for the WBA stock is 6.3 while the median for MCD stock is 5.6.
The median for the difference between WBA and MCD is 0.55.
Let’s now perform the hypothesis test.
Hypotheses
Test Statistic and p-Value
Rejection Region
Conclusion/Interpretation
Fail to reject H_0.
There is not sufficient evidence to suggest that the median stock shares traded is greater for WBA than for MCD.
Recall the insurance data we examined with the paired t-test.
Hypotheses
Test Statistic and p-Value
Rejection Region
Conclusion/Interpretation
Reject H_0.
There is sufficient evidence to suggest that the median repair estimate from garage I is higher than that of garage II.
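For reference, here is one way the signed rank test could be run on the garage data entered earlier in this lecture. This is a sketch under stated assumptions: the one-sided alternative "greater" is inferred from the wording of the conclusion, exact = FALSE is used because the differences contain ties, and the differences are rounded to one decimal place so that floating-point noise does not break those ties. Calling wilcox.test(g1, g2, paired = TRUE, ...) is the equivalent two-vector form.

```r
g1 <- c(17.6, 20.2, 19.5, 11.3, 13.0, 16.3, 15.3, 16.2,
        12.2, 14.8, 21.3, 22.1, 16.9, 17.6, 18.4)
g2 <- c(17.3, 19.1, 18.4, 11.5, 12.7, 15.8, 14.9, 15.3,
        12.0, 14.2, 21.0, 21.0, 16.1, 16.7, 17.5)

# paired differences, rounded so that equal differences are exactly tied
d <- round(g1 - g2, 1)

# signed ranks by hand (there are no 0 differences to eliminate here)
signed_rank <- sign(d) * rank(abs(d))
T_plus <- sum(signed_rank[signed_rank > 0])

# signed rank test on the differences: is the median difference above 0?
res <- wilcox.test(d, alternative = "greater", exact = FALSE)

print(c(T_plus = T_plus, V_R = unname(res$statistic), p = res$p.value))
```

The statistic R labels V is the sum of the positive signed ranks, so it matches T_plus computed by hand.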